Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques

Authors

  • Jiexing Li
  • Arnd Christian König
  • Vivek R. Narasayya
  • Surajit Chaudhuri
Abstract

The ability to estimate resource consumption of SQL queries is crucial for a number of tasks in a database system, such as admission control, query scheduling, and costing during query optimization. Recent work has explored the use of statistical techniques for resource estimation in place of the manually constructed cost models used in query optimization. Such techniques, which require examples of resource usage in queries as training data, offer the promise of superior estimation accuracy since they can account for factors such as hardware characteristics of the system or bias in cardinality estimates. However, the proposed approaches lack robustness in that they do not generalize well to queries that are different from the training examples, resulting in significant estimation errors. Our approach aims to address this problem by combining knowledge of database query processing with statistical models. We model resource usage at the level of individual operators, with different models and features for each operator type, and explicitly model the asymptotic behavior of each operator. This results in significantly better estimation accuracy and the ability to estimate resource usage of arbitrary plans, even when they are very different from the training instances. We validate our approach using various large-scale real-life and benchmark workloads on Microsoft SQL Server.
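
As a rough illustration of the operator-level idea in the abstract, the following Python sketch fits one coefficient per operator type against an assumed asymptotic scaling function and sums the per-operator estimates over a plan. It is not the authors' method: the operator types, the scaling functions (linear for a scan, n log n for a sort), the training numbers, and the names `SCALING`, `fit_rate`, and `estimate_plan` are all hypothetical choices made for this example.

```python
import math

# Assumed asymptotic scaling per operator type (illustrative, not from the paper).
SCALING = {
    "scan": lambda n: float(n),
    "sort": lambda n: n * math.log2(n) if n > 1 else float(n),
}


def fit_rate(op_type, samples):
    """Least-squares fit of cost ~ rate * g(n) through the origin.

    samples is a list of (input_rows, observed_cost) pairs for one operator type.
    """
    g = SCALING[op_type]
    numerator = sum(g(n) * cost for n, cost in samples)
    denominator = sum(g(n) ** 2 for n, _ in samples)
    return numerator / denominator


def estimate_plan(plan, rates):
    """Sum per-operator estimates; plan is a list of (op_type, input_rows)."""
    return sum(rates[op] * SCALING[op](n) for op, n in plan)


# Made-up training observations on small inputs.
training = {
    "scan": [(1000, 2.0), (2000, 4.0)],
    "sort": [(1024, 10.24), (4096, 49.152)],
}
rates = {op: fit_rate(op, samples) for op, samples in training.items()}

# Estimate a plan whose inputs are far larger than anything in the training data.
big_plan = [("scan", 1_000_000), ("sort", 1_048_576)]
total = estimate_plan(big_plan, rates)
print(f"estimated cost: {total:.2f}")
```

Because the scaling function carries the growth behavior, the fitted coefficient can be applied to input sizes far outside the training range, which is the kind of generalization the abstract refers to. A model without that term would have to extrapolate from the training examples alone.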

Similar resources

Resource Usage Estimation of Data Stream Processing Workloads in Datacenter Clouds

Real-time computation of data streams over affordable virtualized infrastructure resources is an important form of data in motion processing architecture. However, processing such data streams while ensuring strict guarantees on quality of services is problematic due to: (i) uncertain stream arrival pattern; (ii) need of processing different types of continuous queries; and (iii) variable resou...

A Statistical Approach Towards Robust Progress Estimation

The need for accurate SQL progress estimation in the context of decision support administration has led to a number of techniques proposed for this task. Unfortunately, no single one of these progress estimators behaves robustly across the variety of SQL queries encountered in practice, meaning that each technique performs poorly for a significant fraction of queries. This paper proposes a nove...

Case Mix Planning using The Technique for Order of Preference by Similarity to Ideal Solution and Robust Estimation: a Case Study

Management of surgery units and operating rooms (OR) plays a key role in optimizing the utilization of hospitals. Along these lines, Case Mix Planning (CMP) is normally applied to long-term planning of OR. This refers to allocating OR time to each patient group. In this paper a mathematical model is applied to optimize the allocation of OR time among surgical groups. In addition, another technique is ...

The VC-Dimension of Queries and Selectivity Estimation Through Sampling

We develop a novel method, based on the statistical concept of the Vapnik-Chervonenkis dimension, to evaluate the selectivity (output cardinality) of SQL queries – a crucial step in optimizing the execution of large scale database and data-mining operations. The major theoretical contribution of this work, which is of independent interest, is an explicit bound to the VC-dimension of a range spa...

Robust Runtime Optimization and Skew-Resistant Execution of Analytical SPARQL Queries on Pig

We describe a system that incrementally translates SPARQL queries to Pig Latin and executes them on a Hadoop cluster. This system is designed to work efficiently on complex queries with many self-joins over huge datasets, avoiding job failures even in the case of joins with unexpected high-value skew. To be robust against cost estimation errors, our system interleaves query optimization with qu...

Journal:
  • PVLDB

Volume 5

Publication date: 2012